Smart Paper Notes

PASTA-GAN++: A Versatile Framework for High-Resolution Unpaired Virtual Try-on

Zhenyu Xie , Zaiyu Huang , Fuwei Zhao , Haoye Dong , Michael Kampffmeyer , Xin Dong , Feida Zhu , Xiaodan Liang

Category: Computer Vision

2022-07-27

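Image-based virtual try-on is one of the most promising applications of human-centric image generation because of its large real-world potential. In this work, we take a step towards versatile virtual try-on solutions, which we argue should have three main properties: they should support unsupervised training, arbitrary garment categories, and controllable garment editing. To this end, we propose a characteristic-preserving end-to-end network, the PAtch-routed SpaTially-Adaptive GAN++ (PASTA-GAN++), to achieve a versatile system for high-resolution unpaired virtual try-on. Specifically, PASTA-GAN++ consists of an innovative patch-routed disentanglement module that decouples the intact garment into normalized patches, retaining garment style information while eliminating garment spatial information, and thereby alleviating overfitting during unsupervised training. Furthermore, PASTA-GAN++ introduces a patch-based garment representation and a patch-guided parsing synthesis block, allowing it to handle arbitrary garment categories and to support local garment editing. Finally, to obtain try-on results with realistic texture details, PASTA-GAN++ incorporates a novel spatially-adaptive residual module that injects the coarse warped garment features into the generator. Extensive experiments on our newly collected UnPaired virtual Try-on (UPT) dataset demonstrate the superiority of PASTA-GAN++ over existing SOTA methods and its ability to perform controllable garment editing.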

Translated by Google Translate

Towards Scalable Unpaired Virtual Try-On via Patch-Routed Spatially-Adaptive GAN

Zhenyu Xie , Zaiyu Huang , Fuwei Zhao , Haoye Dong , Michael Kampffmeyer , Xiaodan Liang

Category: Computer Vision

2021-11-20

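Image-based virtual try-on is one of the most promising applications of human-centric image generation because of its large real-world potential. However, since most try-on approaches fit in-shop garments onto a target person, they require the laborious and restrictive construction of a paired training dataset, which severely limits their scalability. A few recent works attempt to transfer garments directly from one person to another, removing the need to collect paired datasets, but their performance suffers from the lack of paired (supervised) information. In particular, disentangling the style and spatial information of a garment becomes a challenge, which existing methods address either by requiring auxiliary data or through extensive online optimization procedures, and this still inhibits their scalability. To achieve a scalable virtual try-on system that can transfer arbitrary garments between a source and a target person in an unsupervised manner, we propose a texture-preserving end-to-end network, the PAtch-routed SpaTially-Adaptive GAN (PASTA-GAN), which facilitates real-world unpaired virtual try-on. Specifically, to disentangle the style and spatial information of each garment, PASTA-GAN includes an innovative patch-routed disentanglement module that retains the garment's texture and shape characteristics. Guided by the source person's keypoints, the patch-routed disentanglement module first decouples the garment into normalized patches, thereby eliminating the garment's inherent spatial information, and then reconstructs the normalized patches into a warped garment that complies with the target person's pose. Given the warped garment, PASTA-GAN further introduces novel spatially-adaptive residual blocks that guide the generator to synthesize more realistic garment details.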

Cross Attention-guided Dense Network for Images Fusion

Zhengwen Shen , Jun Wang , Zaiyu Pan , Yulian Li , Jiangyu Wang

Category: Computer Vision

2021-09-23

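In recent years, deep learning has driven substantial progress in many computer vision applications; it has been widely used for image fusion and has been shown to achieve adequate performance. However, because existing unsupervised image fusion models have limited ability to model the spatial correspondence between different source images, extracting appropriate features and achieving adaptive, balanced fusion remains a major challenge. In this paper, we propose a novel cross attention-guided image fusion network, which is a unified, unsupervised framework for multi-modal image fusion, multi-exposure image fusion, and multi-focus image fusion. Unlike existing self-attention modules, our cross-attention module focuses on modeling the cross-correlation between different source images. Using the proposed cross-attention module as the core block, we build a densely connected cross attention-guided network that dynamically learns the spatial correspondence in order to better capture important details from the different input images. In addition, an auxiliary branch is designed to model long-range information, and a merging network is attached to reconstruct the final fused image. Extensive experiments on publicly available datasets show that the proposed model outperforms state-of-the-art models both quantitatively and qualitatively.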
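
To illustrate how cross-attention differs from self-attention, here is a minimal sketch in plain Python. It is not the paper's module: the function names are hypothetical, and the learned query/key/value projections of a real network are omitted (an assumption made for brevity). Queries come from one source image's features, while keys and values come from the other.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(feats_a, feats_b):
    """Attend from image A's features (queries) to image B's features (keys and values).

    Simplification (assumption): no learned projections; raw features are used directly.
    """
    d = len(feats_a[0])
    out = []
    for q in feats_a:
        # Scaled dot-product similarity between one A-feature and every B-feature.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in feats_b]
        weights = softmax(scores)
        # Weighted sum of B-features: what image B contributes at this A location.
        out.append([sum(w * v[j] for w, v in zip(weights, feats_b)) for j in range(d)])
    return out


# Toy features: two locations per image, two channels each (made-up values).
feats_a = [[1.0, 0.0], [0.0, 1.0]]
feats_b = [[10.0, 0.0], [0.0, 10.0]]
attended = cross_attention(feats_a, feats_b)
```

Passing the same features as both arguments reduces this to self-attention; the cross variant instead measures correlation between the two source images.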

Rethinking Mobile Block for Efficient Neural Models

Jiangning Zhang , Xiangtai Li , Jian Li , Liang Liu , Zhucun Xue , Boshen Zhang , Zhengkai Jiang , Tianxin Huang , Yabiao Wang , Chengjie Wang

Category: Computer Vision

2023-01-03

This paper focuses on designing efficient models with few parameters and low FLOPs for dense prediction. Although CNN-based lightweight methods have achieved impressive results after years of research, the trade-off between model accuracy and constrained resources still needs improvement. This work rethinks the essential unity of the efficient Inverted Residual Block in MobileNetV2 and the effective Transformer in ViT, inductively abstracting a general concept of a Meta-Mobile Block, and we argue that the specific instantiation is very important to model performance even when the same framework is shared. Motivated by this observation, we deduce a simple yet efficient modern Inverted Residual Mobile Block (iRMB) for mobile applications, which absorbs CNN-like efficiency to model short-distance dependencies and Transformer-like dynamic modeling capability to learn long-distance interactions. Furthermore, we design a ResNet-like 4-phase Efficient MOdel (EMO) based only on a series of iRMBs for dense applications. Extensive experiments on the ImageNet-1K, COCO2017, and ADE20K benchmarks demonstrate the superiority of EMO over state-of-the-art methods; e.g., EMO-1M/2M/5M achieve 71.5, 75.1, and 78.4 Top-1 accuracy, surpassing SoTA CNN- and Transformer-based models while balancing accuracy and efficiency well.

PIE-QG: Paraphrased Information Extraction for Unsupervised Question Generation from Small Corpora

Dinesh Nagumothu , Bahadorreza Ofoghi , Guangyan Huang , Peter W. Eklund

Category: Natural Language Processing | Artificial Intelligence

2023-01-03

Supervised Question Answering systems (QA systems) rely on domain-specific human-labeled data for training. Unsupervised QA systems generate their own question-answer training pairs, typically using secondary knowledge sources to achieve this outcome. Our approach (called PIE-QG) uses Open Information Extraction (OpenIE) to generate synthetic training questions from paraphrased passages and uses the question-answer pairs as training data for a language model for a state-of-the-art QA system based on BERT. Triples in the form of <subject, predicate, object> are extracted from each passage, and questions are formed with subjects (or objects) and predicates while objects (or subjects) are considered as answers. Experimenting on five extractive QA datasets demonstrates that our technique achieves on-par performance with existing state-of-the-art QA systems with the benefit of being trained on an order of magnitude fewer documents and without any recourse to external reference data sources.
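
As a rough illustration of the triple-to-question step described above, the sketch below turns one `<subject, predicate, object>` triple into two question-answer pairs. The `Triple` type, the `triple_to_qa` function, and the question templates are hypothetical placeholders, not the PIE-QG implementation, and the OpenIE extraction and paraphrasing stages are assumed to have run already.

```python
from typing import NamedTuple, Tuple


class Triple(NamedTuple):
    """One <subject, predicate, object> triple, e.g. as produced by an OpenIE system."""
    subject: str
    predicate: str
    obj: str


def triple_to_qa(triple: Triple, ask_for_object: bool = True) -> Tuple[str, str]:
    """Form a question from two triple elements and use the third as the answer.

    The templates below are illustrative assumptions only.
    """
    if ask_for_object:
        # Question built from subject and predicate; the object is the answer.
        return f"{triple.subject} {triple.predicate} what?", triple.obj
    # Question built from predicate and object; the subject is the answer.
    return f"What {triple.predicate} {triple.obj}?", triple.subject


t = Triple("Marie Curie", "discovered", "polonium")
q1, a1 = triple_to_qa(t)
q2, a2 = triple_to_qa(t, ask_for_object=False)
print(q1, "->", a1)
print(q2, "->", a2)
```

Each generated pair, together with its source passage, could then serve as a synthetic training example for an extractive QA model.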

A New Perspective to Boost Vision Transformer for Medical Image Classification

Yuexiang Li , Yawen Huang , Nanjun He , Kai Ma , Yefeng Zheng

Category: Computer Vision | Artificial Intelligence

2023-01-03

Transformers have achieved impressive success on various computer vision tasks. However, most existing studies need to pretrain the Transformer backbone on a large-scale labeled dataset (e.g., ImageNet) to achieve satisfactory performance, and such a dataset is usually unavailable for medical images. Additionally, due to the gap between medical and natural images, the improvement from ImageNet-pretrained weights degrades significantly when the weights are transferred to medical image processing tasks. In this paper, we propose Bootstrap Own Latent of Transformer (BOLT), a self-supervised learning approach designed specifically for medical image classification with a Transformer backbone. BOLT consists of two networks, namely the online and target branches, for self-supervised representation learning. Concretely, the online network is trained to predict the target network's representation of the same patch embedding tokens under a different perturbation. To make the most of the Transformer given limited medical data, we propose an auxiliary difficulty ranking task, in which the Transformer must identify which branch (i.e., online or target) is processing the more difficult perturbed tokens. Overall, the Transformer learns to distill transformation-invariant features from the perturbed tokens so that it can measure difficulty and maintain the consistency of self-supervised representations at the same time. The proposed BOLT is evaluated on three medical image processing tasks: skin lesion classification, knee fatigue fracture grading, and diabetic retinopathy grading. The experimental results validate the superiority of BOLT for medical image classification compared with ImageNet-pretrained weights and state-of-the-art self-supervised learning approaches.
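
The online/target setup can be sketched with the conventions of BYOL-style training, which the name "Bootstrap Own Latent" suggests. Both pieces below are assumptions borrowed from that family of methods, not details confirmed by the abstract: a normalized prediction loss, and a target branch updated as an exponential moving average of the online branch. The function names are hypothetical, and the auxiliary difficulty ranking task is not shown.

```python
import math


def byol_style_loss(online_pred, target_repr):
    """Normalized squared error between two vectors, equal to 2 - 2 * cosine similarity.

    Assumption: BOLT uses a BYOL-like objective; the abstract does not state the loss.
    """
    dot = sum(p * t for p, t in zip(online_pred, target_repr))
    norm_p = math.sqrt(sum(p * p for p in online_pred))
    norm_t = math.sqrt(sum(t * t for t in target_repr))
    return 2.0 - 2.0 * dot / (norm_p * norm_t)


def ema_update(target_weights, online_weights, momentum=0.99):
    """Move target weights slowly towards online weights (assumed update rule)."""
    return [momentum * t + (1.0 - momentum) * o
            for t, o in zip(target_weights, online_weights)]


# Predictions pointing in the same direction as the target give (near) zero loss;
# opposite directions give the maximum loss of 4.
aligned = byol_style_loss([1.0, 2.0], [2.0, 4.0])
opposed = byol_style_loss([1.0, 0.0], [-1.0, 0.0])
new_target = ema_update([0.0, 1.0], [1.0, 1.0], momentum=0.9)
```

Because the loss depends only on direction, the online branch is rewarded for producing representations that stay consistent across perturbations rather than for matching magnitudes.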

Analogical Inference Enhanced Knowledge Graph Embedding

Yao Zhen , Zhang Wen , Chen Mingyang , Huang Yufeng , Yang Yi , Chen Huajun

Category: Artificial Intelligence | Natural Language Processing

2023-01-03

Knowledge graph embedding (KGE), which maps entities and relations in a knowledge graph into continuous vector spaces, has achieved great success in predicting missing links in knowledge graphs. However, knowledge graphs often contain incomplete triples that are difficult for KGE models to infer inductively. To address this challenge, we resort to analogical inference and propose AnKGE, a novel and general self-supervised framework that enhances KGE models with analogical inference capability. We propose an analogical object retriever that retrieves appropriate analogical objects at the entity level, relation level, and triple level. In AnKGE, we train an analogy function for each level of analogical inference; it takes the original element embedding from a well-trained KGE model as input and outputs the analogical object embedding. To combine the inductive inference capability of the original KGE model with the analogical inference capability added by AnKGE, we interpolate the analogy score with the base model score and introduce adaptive weights into the score function used for prediction. Through extensive experiments on the FB15k-237 and WN18RR datasets, we show that AnKGE achieves competitive results on the link prediction task and performs analogical inference well.
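
The score interpolation can be pictured with the minimal sketch below. It assumes the simplest reading of "interpolate", a convex combination with a single weight; the actual AnKGE score function, its per-level terms, and how the adaptive weights are computed are not given in the abstract. The function name, candidate entities, and scores are made up for illustration.

```python
def interpolated_score(base_score: float, analogy_score: float, weight: float) -> float:
    """Blend a base KGE score with an analogy score.

    Assumption: a convex combination with one weight in [0, 1]; in the paper the
    weights are adaptive, which this sketch does not model.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return (1.0 - weight) * base_score + weight * analogy_score


# Hypothetical candidate tails for one query, each with (base_score, analogy_score).
candidates = {"Paris": (0.6, 0.9), "Lyon": (0.7, 0.2)}


def best_candidate(weight: float) -> str:
    """Return the candidate with the highest blended score for a given weight."""
    return max(candidates, key=lambda name: interpolated_score(*candidates[name], weight))


base_only = best_candidate(0.0)   # ranking from the base model alone
blended = best_candidate(0.5)     # ranking after mixing in the analogy score
```

With a weight of 0 the ranking is that of the base model; raising the weight lets the analogy score change which candidate is ranked first.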

Digital Engineering Transformation with Trustworthy AI towards Industry 4.0: Emerging Paradigm Shifts

Jingwei Huang

Category: Artificial Intelligence

2023-01-03

Digital engineering transformation is a crucial process for the engineering paradigm shifts in the fourth industrial revolution (4IR), and artificial intelligence (AI) is a critical enabling technology in digital engineering transformation. This article discusses the following research questions: What are the fundamental changes in the 4IR? More specifically, what are the fundamental changes in engineering? What is digital engineering? What are the main uncertainties there? What is trustworthy AI? Why is it important today? What are emerging engineering paradigm shifts in the 4IR? What is the relationship between the data-intensive paradigm and digital engineering transformation? What should we do for digitalization? From investigating the pattern of industrial revolutions, this article argues that ubiquitous machine intelligence (uMI) is the defining power brought by the 4IR. Digitalization is a condition to leverage ubiquitous machine intelligence. Digital engineering transformation towards Industry 4.0 has three essential building blocks: digitalization of engineering, leveraging ubiquitous machine intelligence, and building digital trust and security. The engineering design community at large is facing an excellent opportunity to bring the new capabilities of ubiquitous machine intelligence and trustworthy AI principles, as well as digital trust, together in various engineering systems design to ensure the trustworthiness of systems in Industry 4.0.

Human-in-the-loop Embodied Intelligence with Interactive Simulation Environment for Surgical Robot Learning

Yonghao Long , Wang Wei , Tao Huang , Yuehao Wang , Qi Dou

Category: Robotics | Artificial Intelligence | Computer Vision | Machine Learning

2023-01-01

Surgical robot automation has attracted increasing research interest over the past decade because of its huge potential to benefit surgeons, nurses, and patients. Recently, the learning paradigm of embodied AI has shown a promising ability to learn good control policies for various complex tasks, and embodied AI simulators play an essential role in supporting this research. However, existing open-source simulators for surgical robots still do not sufficiently support human interaction through physical input devices, which limits effective investigation of how human demonstrations affect policy learning. In this paper, we study human-in-the-loop embodied intelligence with a new interactive simulation platform for surgical robot learning. Specifically, we build our platform on our previously released SurRoL simulator, adding several new features that were co-developed to allow high-quality human interaction via an input device. With these, we further propose to collect human demonstrations and imitate their action patterns to achieve more effective policy learning. We showcase the improvements to our simulation environment brought by the new features and tasks, and validate state-of-the-art reinforcement learning algorithms using the interactive environment. The results are promising, and we hope they pave the way for future research on surgical embodied intelligence. Our platform has been released and will be continuously updated on the website: https://med-air.github.io/SurRoL/

Conditional Diffusion Based on Discrete Graph Structures for Molecular Graph Generation

Han Huang , Leilei Sun , Bowen Du , Weifeng Lv

Category: Machine Learning

2023-01-01

Learning the underlying distribution of molecular graphs and generating high-fidelity samples is a fundamental research problem in drug discovery and material science. However, accurately modeling distribution and rapidly generating novel molecular graphs remain crucial and challenging goals. To accomplish these goals, we propose a novel Conditional Diffusion model based on discrete Graph Structures (CDGS) for molecular graph generation. Specifically, we construct a forward graph diffusion process on both graph structures and inherent features through stochastic differential equations (SDE) and derive discrete graph structures as the condition for reverse generative processes. We present a specialized hybrid graph noise prediction model that extracts the global context and the local node-edge dependency from intermediate graph states. We further utilize ordinary differential equation (ODE) solvers for efficient graph sampling, based on the semi-linear structure of the probability flow ODE. Experiments on diverse datasets validate the effectiveness of our framework. Particularly, the proposed method still generates high-quality molecular graphs in a limited number of steps.
